[00:00.000 --> 00:06.440]  So, yeah, I'm here presenting today on Cloud Native Attack Detection and Simulation.
[00:07.280 --> 00:11.400]  It's an expansion of a talk I did at fwd:cloudsec, and we've spoken previously on some
[00:11.400 --> 00:15.060]  of this at some other conferences, too. But this is the first time I've had the opportunity,
[00:15.060 --> 00:19.540]  really, to gather it all together into one big presentation. So I'm really happy to be here.
[00:19.540 --> 00:24.720]  Hopefully, everyone will get something out of it. So there are a few key things to go through
[00:24.720 --> 00:29.880]  today, really. The first thing is that I want to spend some time comparing and contrasting
[00:30.000 --> 00:36.660]  on-premise detection and cloud detection: what the key differences are between them, including some
[00:37.260 --> 00:41.800]  critical architectural differences, but also in terms of what an attacker's likely to try
[00:41.800 --> 00:48.620]  in the cloud versus what they did on-premise, and how a lot of that varies. Then, digging into the
[00:48.620 --> 00:54.160]  likely attacker activity, we're going to see what that actually looks like on the cloud side of
[00:54.160 --> 00:59.520]  things, what you should be looking for, where you should be looking for it, and so on. And lastly,
[00:59.520 --> 01:04.480]  the simulation piece, which, in buzzword terms, is the idea of bringing DevOps to detection.
[01:04.660 --> 01:09.320]  We've seen a really strong uptake in a lot of the DevOps tooling in the cloud space,
[01:09.320 --> 01:14.400]  because it makes everything easier if you can automate stuff. So we've spent quite a lot of
[01:14.400 --> 01:19.740]  time building out some tooling that we've open-sourced to do just that, and we'll talk
[01:19.740 --> 01:25.580]  through that in some more detail. But just before we get started, I'm sure pretty much everyone's
[01:25.580 --> 01:30.800]  seen some sort of threat actor pyramid. I've got a slightly different take on it from what you often
[01:30.800 --> 01:34.500]  see, in that rather than putting sort of China and Russia at the top and a bunch of script kiddies
[01:34.500 --> 01:40.320]  at the bottom, we're talking more about capabilities than we are about individual groups.
[01:40.440 --> 01:45.420]  So what we've got at the top is bespoke tactics, techniques, and procedures, things that attackers
[01:45.420 --> 01:50.220]  are developing and designing specifically for whatever campaign they're on. And the middle
[01:50.220 --> 01:54.460]  bracket, custom tooling, is where attackers are in a position where perhaps they're not
[01:54.460 --> 01:58.500]  developing their own zero-days or doing their own security research, but they are building their
[01:58.500 --> 02:03.420]  own implants, their own tools, and so on. And sort of the bottom end of the pile is people grabbing
[02:03.420 --> 02:08.500]  tools off GitHub and firing them off willy-nilly across the world. And the reason I bring
[02:08.500 --> 02:14.640]  this up is because, to me, the interesting part of detection sits at the top end there, at the bespoke
[02:14.640 --> 02:19.900]  TTPs, the custom tooling. And realistically, that's also a very expensive place to sit if you're
[02:19.900 --> 02:26.260]  trying to build detective capability. And so a lot of what I do is around driving the costs down
[02:26.260 --> 02:34.360]  or making it easier for organizations and for people to pick up the top end, the advanced
[02:34.360 --> 02:38.780]  end of detection, and use it within their organization. And that will come up quite a
[02:38.780 --> 02:46.240]  few times through today's talk. So first off, on-premise versus cloud. There are quite a few
[02:46.240 --> 02:50.620]  key differences here, really, when it comes to attack detection. And one of the biggest is just
[02:50.620 --> 02:56.160]  in terms of the telemetry that we've got. So on-premise, historically, the vast majority of the
[02:56.160 --> 03:03.640]  useful telemetry and data that we'd be threat hunting in sat around the endpoint. So we're
[03:03.640 --> 03:08.360]  running endpoint detection and response products, you know, pick your favorite from CrowdStrike, Tanium,
[03:08.360 --> 03:13.480]  Cybereason, and those sorts of things. And so we'd be running those on all of our endpoints, probably
[03:13.480 --> 03:20.580]  our servers too, and gathering information like process execution trees, you know, what's
[03:20.580 --> 03:25.320]  happening in memory, looking for specific syscalls and Windows API calls that might be malicious.
[03:25.580 --> 03:29.760]  And we do a lot of our detection on that. If you've paid Darktrace or Vectra a lot of money,
[03:29.760 --> 03:34.300]  you might also have some network telemetry, but it's usually a secondary source. Likewise, with
[03:34.300 --> 03:39.340]  data you might be getting out of your applications. The one key exception there is I'm going to class
[03:39.340 --> 03:45.440]  Active Directory as an application in this little diagram here. Active Directory, obviously, is one
[03:45.440 --> 03:50.160]  of the big points we can get a lot of useful detection information out of on-premise, especially
[03:50.160 --> 03:54.000]  when it comes to sort of lateral movement and privilege escalation. But by and large, still,
[03:54.000 --> 03:58.860]  you know, we're really focused on the endpoints and the servers. Now, when we go to cloud, we've
[03:58.860 --> 04:02.240]  still got all of that to some degree, especially if you're still running virtual machines in a lot
[04:02.240 --> 04:07.280]  of your cloud environments, you know, you're running containers, we've got container security
[04:07.280 --> 04:11.660]  monitoring solutions now that work with that. But actually, we've now got this box that sits around
[04:11.660 --> 04:17.320]  it all, which I'm going to call control plane telemetry. So that is data that we're getting out
[04:17.320 --> 04:24.400]  of the cloud itself and from a given cloud provider about what's being done from an administrative
[04:24.400 --> 04:28.980]  perspective inside the cloud account that you're looking at, or the Azure subscription or Google
[04:28.980 --> 04:35.060]  Cloud project, about who's creating, modifying, or destroying what resources, where, when, and how
[04:35.060 --> 04:42.020]  they're doing that. And so that really forms the core of your detective capability
[04:42.020 --> 04:46.920]  in the cloud. Attackers are still going to do some of what they used to on the endpoints
[04:46.920 --> 04:52.180]  and at the network level. But really, a lot of the threats we've seen, and in fact, a lot of
[04:52.180 --> 04:56.580]  the successful attacks, Capital One springs to mind, you know, happened against the control plane
[04:56.580 --> 05:03.240]  where an attacker's got hold of some credentials for the AWS account, the Azure subscription,
[05:03.240 --> 05:10.920]  and they're targeting that. Now, what comes with that is just a dearth of knowledge, really,
[05:10.920 --> 05:15.100]  in terms of what that actually looks like. We've had sort of 20, 30 years to build up an
[05:15.100 --> 05:19.520]  understanding of what an attack on premise looks like. We know what an attacker is going to try
[05:19.520 --> 05:24.980]  on a Windows network full of, you know, Windows laptops connected to an Active Directory
[05:24.980 --> 05:28.980]  infrastructure. It's all pretty well understood. And new techniques come out all the time.
[05:29.020 --> 05:33.080]  But at the end of the day, the broad attack vectors and a lot of the stuff we should be
[05:33.080 --> 05:39.980]  looking for are all pretty consistent, or at least reasonably well known, even if not
[05:39.980 --> 05:45.000]  always that easy to spot. The problem then comes when we move to the cloud. Not only are we
[05:45.000 --> 05:48.740]  dealing with more platforms than we used to, but also we just don't really have an idea of what
[05:48.940 --> 05:53.580]  a lot of this stuff looks like now. Attackers aren't just targeting workstations and virtual
[05:53.580 --> 06:03.060]  machines. They're targeting serverless functions, Kubernetes clusters, the control plane. And so
[06:03.060 --> 06:08.660]  what that looks like, and what an attacker is likely to try, is just nowhere near as well
[06:08.660 --> 06:14.940]  understood. That said, from my experiences working in this space, I think there's a few key things
[06:14.940 --> 06:22.180]  to call out, really. One of which is that there's this concept of uncertainty of malicious intent.
[06:22.660 --> 06:27.620]  On premise, there's a lot of stuff that's very malicious and it's easy to signature off. That
[06:27.620 --> 06:31.440]  doesn't really exist in the cloud. We'll get into that in a bit more detail in a second.
[06:32.020 --> 06:37.480]  But one of the other things that comes with it is that context is key. Understanding what an
[06:37.480 --> 06:41.580]  attacker is doing in a specific environment rather than just what they're doing in general
[06:41.580 --> 06:47.080]  in a Windows estate. These things vary an awful lot in the cloud, whereas they didn't historically
[06:47.080 --> 06:50.860]  on premise to anything like the same degree. And again, I'll get into that more in a second.
[06:51.020 --> 06:54.140]  And one of the big wins from a detection perspective in the cloud is that visibility
[06:54.140 --> 07:00.280]  tends to be a lot easier. And so, if you look at Google Cloud, for instance, we can turn on
[07:00.280 --> 07:05.840]  audit logs. In fact, they're turned on by default. And we can feed those into a Google Cloud
[07:05.840 --> 07:11.760]  organization-wide log sink and get all of the logs for all of our projects at the control plane
[07:11.760 --> 07:16.280]  layer fed straight into a single location and ship them from there off into our SIEM.
[07:16.440 --> 07:21.380]  And there's no more sort of, we've got our EDR on 90% of our laptops, but there's these 10%
[07:21.380 --> 07:25.740]  of legacy Windows 2000 boxes that we can't do anything with. There's nothing like that.
[07:26.340 --> 07:31.560]  The problem comes, actually, that now shadow IT is even more of a problem than it used to be,
[07:31.560 --> 07:35.440]  because it used to be that if you had shadow IT in your on-premise environments, you could probably
[07:35.440 --> 07:39.940]  still walk around to a switch and have a look at what was plugged into it. And now you've got
[07:39.940 --> 07:45.260]  pretty much zero visibility of marketing walking out to one of their third-party suppliers and
[07:45.260 --> 07:50.000]  saying, hey, build us a website for this marketing campaign. Here's an AWS account that we registered
[07:50.000 --> 07:54.300]  yesterday with one of our credit cards. Keeping track of that's really, really hard. And actually,
[07:54.300 --> 07:58.140]  one of the tricks we've seen a few organizations use is to get really friendly with the finance
[07:58.140 --> 08:03.820]  department and ask finance to essentially report all transactions for the cloud providers. And then
[08:03.820 --> 08:07.540]  you can chase down which teams are doing what that way. It's a bit of a low-tech solution by
[08:07.540 --> 08:14.340]  our usual standards, but it works pretty well. And lastly, while we've got this fantastic ability
[08:14.340 --> 08:20.820]  to automate lots of things in the cloud now with all the APIs they expose, the problem now is that
[08:20.820 --> 08:25.940]  we're not the only ones leveraging that, right? Attackers are also automating a lot of what they
[08:25.940 --> 08:33.440]  do. The classic example here, again, with AWS, is attackers leveraging access keys that have been leaked
[08:33.440 --> 08:40.000]  to GitHub for Bitcoin mining. You can subscribe to a public feed of commits to public GitHub
[08:40.000 --> 08:45.000]  repositories, and attackers tend to feed that straight into essentially their own continuous
[08:45.000 --> 08:49.540]  delivery platforms, which monitor the commits to public repositories, look for things that look
[08:49.540 --> 08:55.400]  like AWS access keys, and feed those straight into a system that will then deploy Bitcoin miners in
[08:55.400 --> 09:01.380]  the account of the compromised creds. It's gotten bad enough now that the mean time to compromise,
[09:01.380 --> 09:07.160]  between keys hitting GitHub and Bitcoin mining starting, is about 20 to 30 seconds, and Amazon
[09:07.160 --> 09:12.860]  themselves have started monitoring that same feed and revoking keys when they find them. So while
[09:12.860 --> 09:15.820]  obviously that's pretty low-tech and in the grand scheme of things not that interesting,
[09:16.640 --> 09:19.800]  the fact that we're seeing that automated makes me think that in the long run
[09:19.800 --> 09:23.860]  we're going to start seeing attackers building out tooling and capabilities that focus on
[09:23.860 --> 09:27.680]  getting in and doing as much damage or as much of what they want to do as rapidly as possible
[09:28.240 --> 09:30.920]  before the defenders have woken up and realized there's a problem.
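The key-hunting pipeline described above comes down to pattern matching on commit diffs, and defenders can run the same kind of check against their own repositories before anything is pushed. What follows is a minimal sketch under stated assumptions: the regex is a simplified pattern (long-term key IDs beginning `AKIA`, temporary ones `ASIA`, followed by 16 uppercase letters or digits), not AWS's or any attacker's actual tooling, and `find_access_key_ids` is a hypothetical helper.

```python
import re
from typing import List

# Simplified pattern (an assumption, not an exhaustive spec): long-term AWS
# access key IDs start with "AKIA", temporary ones with "ASIA", followed by
# 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(diff_text: str) -> List[str]:
    """Return every substring of a commit diff that looks like an AWS access key ID."""
    return ACCESS_KEY_ID.findall(diff_text)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample_diff = """
+aws_access_key_id = AKIAIOSFODNN7EXAMPLE
+region = eu-west-1
"""
    print(find_access_key_ids(sample_diff))  # prints ['AKIAIOSFODNN7EXAMPLE']
```

Wired into a pre-commit hook, a check like this stops the key before it reaches the public feed, which matters when the window to compromise is measured in seconds.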
[09:32.600 --> 09:38.380]  And so I mentioned a mindset shift here. We have this idea of uncertainty of malicious intent.
[09:38.380 --> 09:42.680]  So there are some things that we know are obviously bad. If you've got Mimikatz running on a system,
[09:42.680 --> 09:45.820]  that's probably not supposed to be happening. Likewise, someone accessing
[09:45.820 --> 09:51.300]  LSASS memory or, you know, there's a variety of fairly common things that are obviously bad and
[09:51.300 --> 09:56.540]  we can signature on them, right? That's how antivirus has been working for the last 20 years.
[09:56.940 --> 10:02.740]  Some things are usually bad. HTA spawning PowerShell is a classic example there, but
[10:02.740 --> 10:08.560]  actually that's not always bad. We were doing some threat hunting on one organization's estate
[10:08.560 --> 10:13.940]  and found that when we ran a query for PowerShell being spawned by MSHTA,
[10:13.940 --> 10:18.720]  we got like 15,000 hits or something. So we immediately thought there'd been a major
[10:18.720 --> 10:22.160]  compromise and started digging into it. And it turns out that the driver installations for
[10:22.160 --> 10:28.080]  a certain set of Hewlett-Packard enterprise printer models work by
[10:28.080 --> 10:34.480]  using MSHTA to spawn PowerShell to do the installation. So it's probably bad, but
[10:34.480 --> 10:39.880]  it might not be. And then you've also got things like users requesting Kerberos tickets for
[10:39.880 --> 10:45.240]  services they've not spoken to before, you know, network comms that aren't normal but don't
[10:45.240 --> 10:49.480]  immediately stick out as malicious: things that bear further investigation rather than things that
[10:49.480 --> 10:53.860]  you immediately signature on. The problem in cloud is that there are very few things that are
[10:53.860 --> 11:00.840]  obviously or usually bad. There are a few, but not many. The vast majority of what I'd
[11:00.840 --> 11:05.320]  expect an attacker to do and what I see people doing when they're doing offensive security
[11:05.320 --> 11:12.080]  testing revolves around abusing legitimate functionality, API calls that the cloud
[11:12.080 --> 11:19.620]  providers expose to, you know, install persistence mechanisms, escalate their privileges, access
[11:19.620 --> 11:24.480]  data they shouldn't be, all of these kinds of things. And so everything becomes about context.
[11:25.780 --> 11:32.820]  So let's say we're in AWS, and we've got an IAM user that's had some access keys added to
[11:32.820 --> 11:38.820]  it. In principle, that might be okay, or it might not. So we dig into it. It turns out
[11:38.820 --> 11:44.540]  that was made, or that change was made by a user account or a set of access keys associated with
[11:44.540 --> 11:49.640]  some kind of continuous delivery pipeline that we're running on premise. Okay, that's probably
[11:49.640 --> 11:55.500]  fine. You know, actually that might be expected in this environment. But if we've got a change
[11:55.500 --> 12:00.100]  made by an administrator from a weird network location, maybe without the 2FA they'd
[12:00.100 --> 12:04.840]  usually be using or something else, that actually might not be. And so we might want to further
[12:04.840 --> 12:10.300]  investigate. But that might well be entirely dependent on the environment. And there are two
[12:10.300 --> 12:14.980]  things that will come out of this. One of which is that over time attackers are going to become
[12:14.980 --> 12:19.880]  more and more context aware. I'm already seeing that sort of continuous integration, continuous
[12:19.880 --> 12:24.480]  delivery systems are favorite targets of a lot of attackers. I'll get into that in a bit.
[12:24.940 --> 12:29.680]  And so being able to masquerade as sort of known trusted entities will become more of a thing over
[12:29.680 --> 12:36.840]  time. But equally, that then presents a real problem for managed detection and response
[12:36.840 --> 12:43.140]  providers or MSSPs who are trying to monitor a wide variety of client estates. Because there's
[12:43.140 --> 12:48.220]  no such thing anymore as, you know, we detect known bad and that's great. We've got to be able
[12:48.220 --> 12:54.400]  to monitor on a per environment basis, not even per customer or client. And so that's where things
[12:54.400 --> 12:58.520]  like user behavior analytics starts becoming really useful. You know, is this behavior normal
[12:58.520 --> 13:03.120]  for this environment, for this specific entity? Or is this something that we've never seen before?
[13:03.380 --> 13:07.280]  Even just things like least frequency analysis, you know, for this environment, how often do we
[13:07.280 --> 13:11.700]  see this particular API call made? You know, if it's never been called before, then that might
[13:11.700 --> 13:16.280]  warrant some investigation. If it happens 500 times a day, then well, it's probably fine.
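Least frequency analysis of this kind can be sketched in a few lines over CloudTrail-style records. This assumes each event is a dict carrying CloudTrail's `eventSource` and `eventName` fields and that the baseline is drawn from the same environment being monitored; `rare_api_calls` and its `max_seen` threshold are hypothetical names for illustration, and in practice you would likely express this in your SIEM's query language instead.

```python
from collections import Counter
from typing import Iterable, List


def rare_api_calls(baseline: Iterable[dict], new_events: Iterable[dict],
                   max_seen: int = 0) -> List[dict]:
    """Return new events whose API call appeared at most `max_seen` times in the baseline.

    Each event is a CloudTrail-style record with "eventSource" and "eventName" keys.
    The baseline should come from the same environment being monitored.
    """
    # Count how often each (service, API call) pair occurs in this environment's history.
    counts = Counter((e["eventSource"], e["eventName"]) for e in baseline)
    # Counter returns 0 for unseen keys, so never-before-seen calls are always flagged.
    return [e for e in new_events
            if counts[(e["eventSource"], e["eventName"])] <= max_seen]


if __name__ == "__main__":
    history = [{"eventSource": "s3.amazonaws.com", "eventName": "GetObject"}] * 500
    today = [
        {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
        {"eventSource": "iam.amazonaws.com", "eventName": "CreatePolicyVersion"},
    ]
    for event in rare_api_calls(history, today):
        print(event["eventName"])  # prints CreatePolicyVersion
```

Keying only on the (service, API call) pair keeps the sketch short; a per-entity baseline, closer to user behavior analytics, would add the caller's identity to the key.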
[13:17.480 --> 13:21.620]  Equally, the other thing that comes out of this is that there are some things that just don't
[13:21.620 --> 13:28.120]  exist on premise, in terms of actions an attacker is likely to perform. And so let's say,
[13:28.120 --> 13:34.820]  again, we're working in AWS, we've got an attacker who comes in and they call iam:CreatePolicyVersion
[13:34.820 --> 13:41.020]  to create a new version of the policy document for one of the policies that's attached to them.
[13:41.020 --> 13:46.600]  And so this is changing the permissions they've got by altering the permissions definitions,
[13:46.600 --> 13:52.280]  basically. And actually, if you can call CreatePolicyVersion on any resource, and you have
[13:52.280 --> 13:57.500]  policies attached to you, then that essentially gives you admin access. That's a concept that
[13:57.500 --> 14:01.340]  doesn't really exist to the same degree on premise. So there's a number of these sorts of actions
[14:01.340 --> 14:05.520]  where we don't really have an on premise equivalent. And these are kind of new techniques that
[14:05.520 --> 14:07.560]  defenders are going to have to get their heads around.
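To illustrate triaging the `CreatePolicyVersion` case, here is a hedged sketch over a single CloudTrail record. It assumes `requestParameters` carries `policyDocument` as a JSON string and `setAsDefault` as a boolean, mirroring the API's parameters; verify that against events from your own trail. The function name and the "high"/"review" labels are hypothetical, and a wildcard `Action` is only one of many ways a policy can grant admin.

```python
import json
from typing import Optional


def classify_create_policy_version(event: dict) -> Optional[str]:
    """Triage one CloudTrail record for iam:CreatePolicyVersion.

    Returns None for unrelated events, "high" when the new version is set as the
    default and contains an Allow statement with Action "*", and "review" for any
    other new policy version.
    """
    if (event.get("eventSource") != "iam.amazonaws.com"
            or event.get("eventName") != "CreatePolicyVersion"):
        return None

    params = event.get("requestParameters") or {}
    if not params.get("setAsDefault"):
        # Not active yet, but a later SetDefaultPolicyVersion call could activate it.
        return "review"

    raw = params.get("policyDocument") or "{}"
    if isinstance(raw, str):
        try:
            document = json.loads(raw)
        except json.JSONDecodeError:
            return "review"
    else:
        document = raw
    if not isinstance(document, dict):
        return "review"

    statements = document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    for statement in statements:
        if not isinstance(statement, dict):
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if statement.get("Effect") == "Allow" and "*" in actions:
            return "high"
    return "review"


if __name__ == "__main__":
    admin_doc = json.dumps({"Version": "2012-10-17",
                            "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]})
    event = {"eventSource": "iam.amazonaws.com", "eventName": "CreatePolicyVersion",
             "requestParameters": {"policyDocument": admin_doc, "setAsDefault": True}}
    print(classify_create_policy_version(event))  # prints high
```

Whether a "review" hit is benign still depends on context, such as who made the call and from where, as with the access key example earlier.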
[14:08.780 --> 14:12.960]  This is then massively complicated by the fact that your average enterprise really isn't
[14:12.960 --> 14:18.320]  one cloud provider with a few cloud accounts these days, right? I can't remember the last
[14:18.320 --> 14:24.020]  time I went to a client and they said, we're just in AWS or we're just in Azure, and it turned out
[14:24.020 --> 14:30.480]  to be true. F-Secure is a great example of that. In fact, we use AWS very heavily, but we have got
[14:30.480 --> 14:35.700]  bits of Google Cloud and Azure lying around as well for various reasons. But even if you're
[14:35.700 --> 14:41.620]  just looking at that, that's a fairly complex thing to deal with in and of itself. Multiple
[14:41.620 --> 14:46.260]  cloud providers and completely different technology stacks and different log formats, all of that kind
[14:46.260 --> 14:51.140]  of thing. When you then expand that to include everything else that exists in most organizations'
[14:51.140 --> 14:55.260]  cloud estates these days, you know, we're probably pulling container images down from Docker Hub,
[14:55.260 --> 15:00.360]  storing our source code in GitHub or something like that. We're probably using Slack for
[15:00.360 --> 15:06.780]  comms and we've got Office 365 for our emails and all of these are fairly juicy targets for
[15:06.780 --> 15:11.340]  an attacker in one respect or another. And they all need monitoring to some degree, which presents
[15:11.500 --> 15:14.400]  a pretty big problem for a lot of organizations really, just getting their heads around all of
[15:14.400 --> 15:21.900]  this. So when it comes to actually designing how you do detection in the cloud, I think it would
[15:21.900 --> 15:27.040]  be worth just pausing for a second and talking through a few key observations that we've made
[15:27.700 --> 15:33.740]  about how to successfully integrate such a disparate estate, you know,
[15:33.900 --> 15:40.600]  a wide variety of technology platforms. And the big one really is to centralize everything.
[15:41.000 --> 15:45.820]  Have a single SIEM that handles your on-premise, your cloud, your everything, ship all of your
[15:45.820 --> 15:52.160]  logs into it from everywhere, you know, all into the same single place. One thing that we've
[15:52.160 --> 15:57.740]  definitely noticed is that if you have a security operations center or a threat hunting team,
[15:57.740 --> 16:03.160]  what have you, who are monitoring quite a disparate estate, the more things you make
[16:03.160 --> 16:07.380]  them go look at, the more likely it is that some things are going to get missed or people
[16:07.380 --> 16:11.700]  aren't really paying as much attention to one thing as the others. So put it all in one place.
[16:11.860 --> 16:14.640]  And the other thing you'll note here with this diagram is that for each of the major cloud
[16:14.640 --> 16:19.340]  providers, there's one account that we're shipping logs off to before they go into the SIEM.
[16:19.720 --> 16:23.980]  And there are a few big benefits to this, really. One of which is that if we're shipping all of
[16:23.980 --> 16:28.960]  our logs to a centralized location, we can lock that down to some degree so that only our security
[16:28.960 --> 16:34.640]  people can access it, which means that all the logs, even when they're in the cloud still,
[16:34.640 --> 16:38.020]  are in a place that an attacker probably doesn't immediately have access to if they've compromised
[16:38.020 --> 16:42.920]  one of the project accounts. And it means that it's then relatively straightforward as well for
[16:43.380 --> 16:47.480]  defenders if, let's say, there are some issues with the SIEM or they need more data than is
[16:47.480 --> 16:52.220]  actually being shipped off to the SIEM for various reasons. Splunk charging per gigabyte
[16:52.220 --> 16:56.880]  is usually one of the reasons why not everything gets shipped. You might need more data.
[16:56.880 --> 17:02.060]  So you go into this one location for each cloud provider, and that's where that data lives.
[17:03.380 --> 17:07.860]  And there are also a few key data sources to be looking at here. And I'm going to pick a couple
[17:07.860 --> 17:12.960]  out to dig into in detail. But you'll notice from this table, the control plane audit logs
[17:12.960 --> 17:17.200]  are bolded at the top there. And there's a very good reason for that. And that is that,
[17:17.200 --> 17:22.840]  in my experience, something like probably 70-80% of attacker activity is going to show up in those
[17:22.840 --> 17:27.340]  control plane logs when we're looking at the cloud. Just because even if an attacker is targeting
[17:27.340 --> 17:30.700]  virtual machines or what have you, they're eventually likely to want to pivot out into
[17:30.700 --> 17:35.540]  the underlying control plane layer and start messing around with other resources in the
[17:35.540 --> 17:40.000]  environment, data stores, or escalate their privileges or anything like that. That's all
[17:40.000 --> 17:47.700]  going to come down through the control plane. So for AWS, you've got CloudTrail, you've got the
[17:47.700 --> 17:56.120]  Activity Log in Azure and Cloud Audit Logs in GCP, and actually Kubernetes has audit logging for the Kubernetes API too. And as we
[17:56.120 --> 18:02.460]  say here, this logs almost every control plane level event. I say almost because people keep
[18:02.460 --> 18:08.560]  proving me wrong with that with AWS CloudTrail. We've seen a few people now digging around trying
[18:08.560 --> 18:12.620]  to find API calls that aren't correctly logged, and there are a few here and there. AWS do their
[18:12.620 --> 18:17.960]  best to catch them, but I'm sure that same issue will exist in Azure and GCP. But by and large,
[18:17.960 --> 18:23.200]  you're going to get logging for pretty much everything that happens. The key thing to
[18:23.200 --> 18:27.720]  remember here is to make sure that you have it all turned on properly. AWS is particularly bad
[18:27.720 --> 18:31.460]  for this in that CloudTrail has a number of different knobs and settings you can tweak
[18:31.940 --> 18:39.320]  that vary what it logs and how it logs it. So often we find that either multi-region logging
[18:39.320 --> 18:43.360]  hasn't been turned on, which means you're only getting data for us-east-1 or wherever it is
[18:43.360 --> 18:47.480]  you've got the trail deployed. And equally, if you don't have global service events enabled, then you
[18:47.480 --> 18:52.080]  don't get anything for IAM or a couple of other critical services. So it's well worth digging
[18:52.080 --> 18:55.520]  into your provider's documentation and working out exactly what you should have turned on there
[18:55.520 --> 19:03.300]  and making sure you've got that. Now, Cloud Native Detection Services. So GuardDuty is
[19:03.300 --> 19:07.640]  probably the big name here, but you've also got Advanced Threat Protection from Azure and
[19:08.780 --> 19:12.800]  Security Command Center from Google has some of this built in.
[19:13.000 --> 19:19.000]  And what we've got here is automatic detection of lower sophistication attacker activity.
[19:19.620 --> 19:23.780]  And what I mean by that is this is essentially sort of antivirus for cloud. We're spotting known
[19:23.780 --> 19:29.700]  bad things like network communications to known malware command and control addresses.
[19:30.220 --> 19:37.360]  In the case of GuardDuty, it'll detect when someone runs AWS API commands
[19:37.360 --> 19:42.180]  from Kali Linux if they haven't changed the user agent, on the basis that obviously everyone using
[19:42.180 --> 19:47.700]  Kali Linux must be bad, so therefore we should be firing alerts for it. And while they have their
[19:47.700 --> 19:53.440]  flaws, actually they're still a pretty cost effective way of detecting sort of low complexity,
[19:53.440 --> 20:00.360]  low sophistication attacks. So I would always turn it on. I just wouldn't rely on it. You know,
[20:00.360 --> 20:03.040]  you need more than this to be able to spot more advanced attackers.
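GuardDuty's internal detection logic isn't public, so the following is only a naive illustration of the idea described above: a substring check on the `userAgent` field of a CloudTrail record, assuming (as the remark about changing the user agent implies) that the operating system name leaks into the user agent that AWS tooling sends. The marker list, the example user-agent strings, and `pentest_distro_marker` are all assumptions for illustration.

```python
from typing import Optional

# Hypothetical markers: substrings assumed to appear in the OS portion of the
# user agent sent by AWS SDKs and the CLI (e.g. a Kali kernel release string).
PENTEST_DISTRO_MARKERS = ("kali", "parrot", "pentoo")


def pentest_distro_marker(event: dict) -> Optional[str]:
    """Return the first pentest-distro marker found in a CloudTrail record's userAgent."""
    user_agent = (event.get("userAgent") or "").lower()
    for marker in PENTEST_DISTRO_MARKERS:
        if marker in user_agent:
            return marker
    return None


if __name__ == "__main__":
    # Illustrative user-agent strings, not captured from real traffic.
    careless = {"userAgent": "aws-cli/2.0.0 Python/3.8.2 Linux/5.4.0-kali3-amd64 botocore/2.0.0"}
    print(pentest_distro_marker(careless))  # prints kali
    # Overriding the user agent defeats the check entirely.
    print(pentest_distro_marker({"userAgent": "Boto3/1.26.0"}))  # prints None
```

Anyone who sets a custom user agent slips straight past a check like this, which is why it sits in the low-sophistication tier and shouldn't be relied on alone.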
[20:04.580 --> 20:09.380]  I'm also leaving out Sentinel when we're talking about this. So Sentinel is Microsoft's SIEM
[20:09.380 --> 20:16.300]  that they pair with Azure. And actually Sentinel's pretty capable in and of
[20:16.300 --> 20:21.760]  itself. And there's a pretty great team of people building some pretty serious capability into that
[20:21.760 --> 20:25.080]  that goes well beyond what we've seen in GuardDuty and what they've got in the
[20:25.080 --> 20:30.580]  advanced threat protection package. And so if you're in Azure and you've not got anything like
[20:30.580 --> 20:34.900]  this set up yet, I'd definitely be looking quite hard at Sentinel. We've been pretty impressed with
[20:34.900 --> 20:41.980]  it where we've been running into it so far. Service specific telemetry is the last one I
[20:41.980 --> 20:46.940]  want to dig into here when we're talking about log sources. So here we're talking about the kind
[20:46.940 --> 20:52.000]  of logs you get out of things like storage accounts, access and object logs, executions
[20:52.000 --> 20:59.000]  of serverless functions, audit logs from your key management system keys, this sort of thing where
[20:59.000 --> 21:04.100]  we've got specific telemetry about what's happening inside a specific service or set of resources.
[21:05.000 --> 21:10.120]  This is a bit of an interesting one because everything here is going to be case by case.
[21:10.520 --> 21:14.680]  Sometimes this data could be extremely valuable. Sometimes it could be basically useless.
[21:15.280 --> 21:20.280]  So you're going to end up needing to tune this on a per environment basis and choose what to turn
[21:20.280 --> 21:24.600]  on and where, you know, as relevant depending on what use you think you're going to get out
[21:24.600 --> 21:32.780]  of the data. So one good example here is S3 access logs in AWS. If we've got an S3 bucket that's
[21:32.780 --> 21:39.580]  doing nothing but hosting images and CSS files for a static website, we really don't care who's
[21:39.580 --> 21:44.300]  accessing it. On the other hand, if you have your organization's most important data in some S3
[21:44.300 --> 21:49.040]  buckets, it's probably well worth having those logs turned on so that you can spot who's accessing
[21:49.040 --> 21:55.000]  them where, when, and how. So, yeah, this one takes a bit more thought. But once you've got,
[21:55.000 --> 22:01.080]  like, the key main sources turned on, the audit logs, that sort of thing, this is well worth
[22:01.080 --> 22:08.660]  looking into, I think. So next up, one of the real problems we've got at the moment is that, having
[22:08.660 --> 22:15.400]  talked now about what data you can get to detect attacks with, we still have to work out what an attack
[22:15.400 --> 22:19.380]  actually looks like, right? I've spoken a little bit about that to some degree, but we've got a
[22:19.380 --> 22:23.580]  real problem actually at the moment just in terms of getting hold of data to show what attackers
[22:23.580 --> 22:28.800]  are actually doing in the cloud. So I've been talking on and off to MITRE's cloud team about
[22:28.800 --> 22:34.080]  this for the MITRE ATT&CK framework, and they're doing their best, but they're really struggling
[22:34.080 --> 22:37.740]  to get their hands on decent threat intel, and I think the rest of the industry's in the same boat.
[22:37.740 --> 22:43.500]  So this is the on-premise ATT&CK matrix on the left, and the cloud one on the right. For those
[22:43.500 --> 22:48.420]  who aren't familiar with MITRE ATT&CK, it's a taxonomy of likely attacker tactics, techniques,
[22:48.420 --> 22:54.140]  and procedures, TTPs, things that attackers are likely to try in a given environment. So the one on the
[22:54.140 --> 22:57.520]  left, as I say, is Windows. As you can see, there's a lot of things in there that we've
[22:57.520 --> 23:04.200]  noted attackers doing over time. For cloud, much less so, as you can see on the right there.
[23:04.760 --> 23:09.940]  And so if we were to purely rely on what we know attackers are doing actively,
[23:10.920 --> 23:13.980]  we're barely going to spot anything, because there's just not very much information out there
[23:13.980 --> 23:19.560]  about this. And where we see reports of, oh, we think an attacker might be doing X, Y, and Z,
[23:19.560 --> 23:23.480]  it's actually often really hard to validate that. You can't just sort of jump onto Google and dig
[23:23.480 --> 23:29.040]  out an old Mandiant or CrowdStrike PDF, you know, from the last APT29 report or whatever,
[23:29.040 --> 23:32.440]  and there's a lot less information sharing on this. And part of that, I think, is actually the
[23:32.440 --> 23:37.560]  cloud providers' fault. I have to say, we've not had great experiences working with some of them
[23:37.560 --> 23:44.460]  on this. They're always very keen to hide any evidence of sort of real malicious activity,
[23:44.460 --> 23:48.300]  just because it's potentially damaging to their reputation. But that doesn't help us
[23:48.300 --> 23:53.820]  from a defensive perspective. So that said, you know, there are a few places you can draw
[23:53.820 --> 23:58.040]  inspiration from, I think, for working out what an attacker is likely to do in your environment.
[23:58.840 --> 24:03.500]  Certainly Scott Piper at Summit Root is going to be one of the big ones for AWS,
[24:03.500 --> 24:07.360]  and the guys over at Rhino Security have been putting out some great research for years in
[24:07.360 --> 24:12.120]  this space. If you're a fan of Google Cloud, then GitLab's internal red team have done some
[24:12.120 --> 24:16.040]  pretty fantastic stuff there. And there's a variety of tools, too, that are pretty useful
[24:16.040 --> 24:24.600]  for working this out. For AWS especially, Pacu from Rhino is full of modules of things that an
[24:24.600 --> 24:29.280]  attacker might want to do. And given how powerful it is, we use it pretty
[24:29.280 --> 24:33.120]  frequently when we're doing offensive security testing of one kind or another. It's pretty
[24:33.120 --> 24:39.220]  likely that a lot of the lower-skilled attackers, or people using public tooling, are going to be
[24:39.220 --> 24:44.420]  taking Pacu and trying to do stuff with it. So if you can signature most of what Pacu's doing,
[24:44.420 --> 24:48.020]  or at least be in a position where you've got the right data to be able to spot it if it does
[24:48.020 --> 24:53.720]  happen, that puts you in a pretty decent place. So go have a look at a lot of the existing tooling
[24:53.720 --> 24:58.500]  that's out there, and see what penetration testers say they're doing, you know, read around
[24:58.500 --> 25:04.440]  on the blog posts they publish, and base a lot of your detection work on what looks like it's
[25:04.440 --> 25:08.500]  most relevant from what you see that way. And that's a pretty great way, in my experience,
[25:08.500 --> 25:13.400]  to expand beyond sort of MITRE ATT&CK and the other common industry frameworks.
[25:14.420 --> 25:18.560]  But going beyond that, we can take a high-level look at what an attacker's
[25:18.560 --> 25:24.880]  actually likely to do. There are four key areas, really, that I see
[25:26.660 --> 25:31.360]  us, other pen testing firms, red teams and so on exploiting on engagements:
[25:31.560 --> 25:37.440]  identity management, pivoting between environments,
[25:37.440 --> 25:42.080]  source code management and continuous delivery, and application vulnerabilities.
[25:42.740 --> 25:48.460]  You know, we'll dig into those one after the other. So first up, identity management exploitation.
[25:48.460 --> 25:53.580]  Here we've got an attacker who's managed to get hold of some
[25:53.580 --> 25:58.840]  credentials in one fashion or another: access keys they've stolen out of source
[25:58.840 --> 26:03.260]  code, say, or service account details, or they're targeting a single sign-on
[26:03.260 --> 26:07.880]  mechanism with credentials they've found somewhere. What we end up with
[26:07.880 --> 26:12.480]  is an attacker leveraging credentials to get into the environment and start doing things from there.
[26:12.480 --> 26:18.860]  So this is pretty common. Verizon's 2020 Data Breach Investigations Report reckons that
[26:19.420 --> 26:26.220]  somewhere in the region of about 80% of cloud breaches involved lost or stolen credentials.
[26:26.480 --> 26:29.400]  So this really is the big one to pay attention to.
[26:31.020 --> 26:35.280]  This also comes up in another context, which I'll get into in a minute, but actually
[26:37.080 --> 26:40.900]  making sure that you've got sensible procedures around how you handle credentials, enforcing
[26:40.900 --> 26:46.360]  multi-factor authentication and so on is really important. The other thing to note here is that
[26:46.360 --> 26:51.040]  these credentials are often found in what seem like fairly stupid locations.
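Since keys turn up in such unlikely places, one cheap defensive step is to sweep those places yourself before an attacker does. The sketch below is a minimal example, assuming you can export file-share, wiki or repository contents as plain text; it looks for the AWS access key ID format, a 20-character ID starting with AKIA for long-term IAM user keys or ASIA for temporary STS credentials. It only finds key IDs, not secret keys, so treat each hit as a lead to investigate rather than a confirmed leak.

```python
import re
from typing import Iterable, Iterator, Tuple

# AWS access key IDs are 20 characters: a 4-character prefix followed by
# 16 uppercase letters or digits. AKIA marks long-term IAM user keys and
# ASIA marks temporary STS credentials.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, key_id) for every candidate access key ID found."""
    for lineno, line in enumerate(lines, start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            yield lineno, match.group(0)
```

Passing an open file object works directly, since iterating a file yields its lines; for example, `list(find_access_key_ids(open("export.txt")))`, where `export.txt` is a hypothetical dump of a share.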
[26:52.520 --> 26:58.800]  When performing penetration tests, we've often found them in file shares, SharePoint sites,
[26:58.800 --> 27:04.520]  source code repositories to a degree too. People leave them lying around. And so that can be
[27:04.520 --> 27:09.180]  pretty problematic. But even in cases where people haven't left the credentials lying around,
[27:09.180 --> 27:16.740]  actually, a lot of the single sign-on tools that are used, a lot of the ways we do identity
[27:16.740 --> 27:21.500]  management in the cloud aren't as well understood as some of the legacy on-premise equivalents.
[27:21.500 --> 27:27.000]  So people make mistakes through lack of
[27:27.000 --> 27:32.020]  education, lack of understanding of what's going on. In this particular case, we had an organization
[27:32.020 --> 27:37.940]  deploy AWS Cognito pretty much out of the box, with the default configuration left in place.
[27:38.580 --> 27:43.860]  For those who aren't familiar, Cognito is a single sign-on user management tool that's designed to be
[27:43.860 --> 27:48.120]  integrated into web applications. So the idea is that you can use this to do your user management,
[27:48.120 --> 27:51.000]  so you don't have to implement that in your own web app; an AWS service
[27:51.000 --> 27:56.680]  takes care of it for you. You can also plug it into a few AWS services natively. So in this
[27:56.680 --> 28:02.700]  particular case, they'd used it as the front end for a fairly large data store. And because
[28:02.700 --> 28:08.020]  everything had been left at default, you had open registration enabled, so anyone could come
[28:08.020 --> 28:11.780]  along and register an account. That's what you want for a public-facing web application,
[28:11.780 --> 28:17.120]  but probably not for the authentication mechanism of a sensitive data store. And then it turns out
[28:17.120 --> 28:23.020]  that by default, if you don't configure multiple user account groups, then everyone goes into the
[28:23.260 --> 28:28.540]  default group, and the default group is administrator in the data storage thing that we were looking at.
[28:28.540 --> 28:33.620]  So we essentially just registered ourselves an account coming in from the internet and had admin
[28:33.620 --> 28:38.000]  access to this big data store. And so the important thing to note here is they'd not
[28:38.000 --> 28:42.020]  deliberately misconfigured it. They'd not even accidentally misconfigured it. They'd left it as
[28:42.820 --> 28:47.780]  default, and it turns out the defaults had bitten them quite badly in the backside in this case.
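Because the problem here was defaults rather than a deliberate change, it is worth auditing for open registration explicitly. A minimal sketch, assuming you have already fetched each pool's description, for example with boto3's `cognito-idp` client via `describe_user_pool(UserPoolId=...)["UserPool"]`: pools where `AdminCreateUserConfig.AllowAdminCreateUserOnly` is false or absent let anyone sign themselves up. It does not cover the second half of the story, which group new users land in and what that group grants, since that depends on the application behind the pool.

```python
from typing import Any, Dict, List


def pools_with_self_signup(user_pools: List[Dict[str, Any]]) -> List[str]:
    """Return the IDs of Cognito user pools that let anyone register an account.

    Each item is expected to have the shape of the "UserPool" object returned
    by the cognito-idp DescribeUserPool API.
    """
    flagged = []
    for pool in user_pools:
        config = pool.get("AdminCreateUserConfig", {})
        # When AllowAdminCreateUserOnly is False or missing, users can sign
        # themselves up: the open-registration case described above.
        if not config.get("AllowAdminCreateUserOnly", False):
            flagged.append(pool["Id"])
    return flagged
```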
[28:49.440 --> 28:54.780]  Next up, pivoting around between environments. This is quite a common one we see.
[28:54.780 --> 28:59.980]  It's really common, obviously, for attackers to go in with phishing
[28:59.980 --> 29:05.920]  campaigns, either for phishing for credentials or getting a foothold on a network with, you know,
[29:05.920 --> 29:10.220]  pick your favorite implant, Cobalt Strike, Covenant, whatever, and pivot around from there,
[29:10.220 --> 29:17.700]  move up into the cloud. And so actually, that can be a pretty effective route. We're seeing it an
[29:17.700 --> 29:23.320]  awful lot that way around. We have also seen a few cases where attackers have somehow breached
[29:23.320 --> 29:28.940]  the cloud environment, either through an application vulnerability or something else. And they've been
[29:28.940 --> 29:34.320]  able to pivot around inside the cloud until they get to, say, an S3 bucket or some other data store
[29:34.320 --> 29:40.760]  that's got something sensitive in it. And in a couple of cases, we found either sort of SSH keys
[29:40.760 --> 29:47.160]  that allowed us to pivot back on premise over the VPNs they had set up in the cloud, or credentials
[29:47.160 --> 29:53.020]  to sort of third party management solutions that they were using, that we were then able to go in
[29:53.020 --> 29:57.060]  and manipulate what was happening there. So actually, the danger here is sort of in both
[29:57.060 --> 30:01.740]  directions, even if the diagram doesn't show it. You can either start on premise and pivot into the cloud,
[30:01.740 --> 30:06.460]  or you can start in the cloud and pivot on premise. And you could probably pivot between cloud environments
[30:06.460 --> 30:11.060]  as well. Not something I've ever done personally, but I'm sure it'd be feasible. As an
[30:11.060 --> 30:16.060]  example, one thing I've seen pulled off on a number of occasions now, and the reason I generally recommend
[30:16.060 --> 30:22.260]  not using Active Directory for your cloud single sign-on, is cases where an
[30:23.100 --> 30:27.060]  organization's on-premise Active Directory has been compromised, or we'd compromised it as part
[30:27.060 --> 30:33.120]  of a red team. We've got ourselves in, we've added some users that we have access to, to the
[30:33.120 --> 30:39.920]  right groups in Active Directory to get cloud access, AWS, Azure, what have you. And then by
[30:39.920 --> 30:44.300]  virtue of having that, we then become administrators in the cloud, at which point we've owned the cloud
[30:44.300 --> 30:50.040]  as well. So it's either well worth investing in hardening your Active Directory, if you are
[30:50.040 --> 30:56.080]  going to do Active Directory single sign-on to the cloud, especially with Azure AD. Or equally,
[30:56.080 --> 31:00.880]  we've seen a number of organizations now go out and buy Ping, Okta, some other third-party single
[31:00.880 --> 31:05.200]  sign-on, and use that for their cloud access, completely separate to all their on-prem stuff.
[31:05.400 --> 31:10.160]  But it's definitely worth segregating or separating your Active Directory environment
[31:10.160 --> 31:15.520]  and your cloud environments as much as possible, given how weak most legacy AD environments are.
[31:16.900 --> 31:23.160]  Next up, source code management and continuous delivery. And so here, this is where we've got an
[31:23.160 --> 31:27.220]  attacker who's targeting code repositories or the pipelines that are deploying these things into the
[31:27.220 --> 31:32.560]  cloud. We're manipulating code being deployed as applications, or we're manipulating infrastructure
[31:32.560 --> 31:37.240]  as code that controls the cloud environment itself. And if we can commit code, we can have
[31:37.240 --> 31:41.760]  the pipeline push it out. If we can manipulate the pipeline, we can modify the code as it goes
[31:41.760 --> 31:47.080]  through the pipeline. Essentially, your cloud is as secure as the things deploying the applications
[31:47.080 --> 31:51.460]  and infrastructure. And that's something I think often gets overlooked. It's really important to
[31:51.460 --> 31:58.760]  also take a look at your source code repositories and pipelines. Next up, application exploitation.
[31:58.760 --> 32:04.360]  So we've got an attacker coming in from the web, and we're going back to sort of the early days of
[32:04.360 --> 32:08.300]  application security, where SQL injection and remote code execution were something you found
[32:08.300 --> 32:13.440]  on a regular basis. Obviously, it's a lot less common now. But we've seen new classes of
[32:13.440 --> 32:17.840]  vulnerabilities crop up, and we've seen things like server-side request forgery become more relevant
[32:17.840 --> 32:22.980]  now than it used to be on premise, thanks to instance metadata services and so on. And
[32:23.580 --> 32:28.880]  being able to pivot into the cloud via an application attack is obviously
[32:28.880 --> 32:33.620]  very useful to an attacker, so it gets exploited a fair bit, too, in my experience.
[32:33.620 --> 32:39.340]  It's usually vulnerabilities in commercial off-the-shelf components, or open source apps
[32:39.340 --> 32:43.220]  that are being deployed that have been misconfigured, in my experience, and we see a lot of that.
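On the server-side request forgery point: AWS's IMDSv2 requires a session token obtained through a PUT request with a custom header, which a typical SSRF that can only issue GET requests cannot obtain. A minimal audit sketch, assuming you pass in a response shaped like EC2's `DescribeInstances` output (for example from boto3's `ec2` client; paginate in real use), flags instances whose metadata endpoint is on but still accepts IMDSv1.

```python
from typing import Any, Dict, List


def instances_allowing_imdsv1(response: Dict[str, Any]) -> List[str]:
    """List EC2 instance IDs whose metadata service still accepts IMDSv1.

    `response` is expected to have the shape of an EC2 DescribeInstances result.
    """
    flagged = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            endpoint_on = options.get("HttpEndpoint", "enabled") == "enabled"
            # HttpTokens == "required" means IMDSv2 only; anything else
            # ("optional" or missing) still allows token-less IMDSv1 requests.
            if endpoint_on and options.get("HttpTokens") != "required":
                flagged.append(instance["InstanceId"])
    return flagged
```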
[32:44.680 --> 32:49.780]  But generally speaking, obviously, this is still a significant threat, and that's why
[32:49.780 --> 32:56.140]  companies still keep investing in AppSec teams. On the plus side, though, when it comes to
[32:56.140 --> 32:59.640]  application exploitation, it's generally a lot easier to spot this stuff in the cloud than it
[32:59.640 --> 33:05.080]  was on premise, or at least a lot easier to gather the right data for it. In this case,
[33:05.080 --> 33:10.840]  we're running an Azure Kubernetes cluster with some applications running in it, and by the
[33:10.840 --> 33:14.800]  time the traffic hits the application and actually exploits it, it's passed through probably a web
[33:14.800 --> 33:19.260]  application firewall on the front, we've got some kind of API gateway, we've got the ingress
[33:19.260 --> 33:24.700]  controller going into the Kubernetes cluster, we've got the logs from the pod, from the container that
[33:24.700 --> 33:29.480]  the app's running in, we've got whatever application logs are being generated, and we can ship all of
[33:29.480 --> 33:38.180]  those out into our SIEM, and we can monitor and react to that. So there's quite a lot more we can
[33:38.180 --> 33:43.760]  do here than on premise. Also worth noting, if you do decide to go down the route of detecting
[33:43.760 --> 33:48.800]  against things happening in your applications, you should probably focus in on looking at
[33:48.800 --> 33:53.520]  authenticated traffic. I'm sure we're all aware that most applications on the internet get
[33:53.960 --> 33:59.620]  bombarded with sqlmap scans and all that sort of stuff. The easiest way to discard all that
[33:59.620 --> 34:03.980]  and focus in on the interesting data is by looking at what your authenticated users are doing.
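As a rough illustration of that filtering idea, the sketch below drops anonymous requests entirely and counts error responses per authenticated user, on the assumption that a logged-in user racking up errors is more interesting than internet background noise. The record fields `user` and `status` and the threshold of five are hypothetical; adapt them to whatever your WAF, gateway or application logs actually contain.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def noisy_authenticated_users(
    records: Iterable[Dict[str, Any]], threshold: int = 5
) -> Dict[str, int]:
    """Count error responses per authenticated user, ignoring anonymous traffic.

    Each record is a parsed access-log entry with hypothetical "user" and
    "status" fields.
    """
    errors: Counter = Counter()
    for record in records:
        user = record.get("user")
        if not user:
            # Unauthenticated requests are mostly scanner noise: skip them.
            continue
        if int(record.get("status", 200)) >= 400:
            errors[user] += 1
    return {user: count for user, count in errors.items() if count >= threshold}
```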
[34:08.140 --> 34:12.100]  So we talked about a variety of different things that an attacker is likely to try,
[34:12.100 --> 34:15.600]  and this is essentially the paths we go down when we're looking to work out what we should
[34:15.600 --> 34:19.760]  be spotting in a given environment. Threat model your environment, and work out what an
[34:19.760 --> 34:23.180]  attacker is likely to try, how they're going to do it, what the likely attack paths are,
[34:23.640 --> 34:27.140]  prioritize those based on what you think is most likely to be exploited, either because it's easy
[34:27.140 --> 34:33.500]  or because it's the most obvious or whatever else. Then define those attack paths, ideally in some
[34:33.500 --> 34:40.040]  sort of code format so it's easy to share around or machine-read. Verify that the right logs and
[34:40.040 --> 34:43.740]  telemetry are available to defenders, and then go in and do some actual testing to see whether
[34:43.740 --> 34:48.920]  if we fire these test cases off, we get the right log data, we get something that we can action.
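The "define attack paths as code, then check the telemetry" steps can be sketched as plain data plus one check. This is a hypothetical structure, not a standard format: each step records the log source a defender would expect to see it in, paths carry a likelihood score for prioritisation, and the check reports which steps of each path (most likely first) have no matching log source being collected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Step:
    name: str
    log_source: str  # where defenders would expect to see this step


@dataclass
class AttackPath:
    name: str
    likelihood: int  # higher means more likely; used to prioritise
    steps: List[Step] = field(default_factory=list)


def telemetry_gaps(paths: List[AttackPath], collected: Set[str]) -> Dict[str, List[str]]:
    """Map each attack path, most likely first, to the steps nobody would see."""
    gaps: Dict[str, List[str]] = {}
    for path in sorted(paths, key=lambda p: p.likelihood, reverse=True):
        missing = [s.name for s in path.steps if s.log_source not in collected]
        if missing:
            gaps[path.name] = missing
    return gaps
```

Keeping the paths as data like this is what makes them easy to share and to feed into the automated testing discussed later.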
[34:51.650 --> 34:55.890]  But if you're not familiar enough, really, with your cloud environments to be able to jump in and
[34:55.890 --> 35:03.490]  do this, or you don't have the ability to do it at the scale your organization is working at,
[35:03.490 --> 35:07.890]  you've got to pick somewhere to start. And one of the easiest places, some of the biggest quick
[35:07.890 --> 35:14.030]  wins, are around either the objective end of the kill chain, where an attacker is looking to get
[35:14.030 --> 35:18.810]  access to key data sources or to start ransomware-ing everything or whatever else, and look
[35:18.810 --> 35:25.570]  at what those are, put key monitoring around that, but also look at infrequent actions that are
[35:25.570 --> 35:29.430]  generally highly privileged or affect your security model. Things like anything that
[35:29.430 --> 35:34.290]  involves turning off or altering the telemetry you're getting, modifying CloudTrail, turning
[35:34.290 --> 35:40.890]  GuardDuty off in AWS, changing your audit log configs in Azure, or anything around manipulating
[35:40.890 --> 35:49.350]  your permissions model, your IAM policies in AWS, RBAC in Azure, anything that's modifying
[35:49.350 --> 35:54.490]  any of that. Those will make really good points to build out some initial detection around.
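For AWS, those infrequent, high-impact actions map onto specific CloudTrail event names. A minimal sketch, assuming CloudTrail log files in their standard `{"Records": [...]}` JSON shape: the event names below are real CloudTrail events for stopping or changing trails, deleting or updating GuardDuty detectors, and attaching or rewriting IAM policies, but the list is a starting point rather than a complete set, and the Azure equivalents are not covered.

```python
from typing import Any, Dict, List

# Rarely used, high-impact API calls worth alerting on by default.
HIGH_RISK_EVENTS = {
    "cloudtrail.amazonaws.com": {"StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors"},
    # UpdateDetector is also used for benign changes, so expect some triage.
    "guardduty.amazonaws.com": {"DeleteDetector", "UpdateDetector"},
    "iam.amazonaws.com": {
        "AttachRolePolicy", "AttachUserPolicy", "AttachGroupPolicy",
        "PutRolePolicy", "PutUserPolicy", "PutGroupPolicy",
        "CreatePolicyVersion", "SetDefaultPolicyVersion",
    },
}


def high_risk_events(log_file: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the records in a CloudTrail log file that match a watched event."""
    hits = []
    for record in log_file.get("Records", []):
        watched = HIGH_RISK_EVENTS.get(record.get("eventSource"), set())
        if record.get("eventName") in watched:
            hits.append(record)
    return hits
```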
[35:57.890 --> 36:03.170]  So let's say now we've got some detection that we've built, and we think we've got some use
[36:03.170 --> 36:08.310]  cases implemented to spot some stuff. How do we validate those? As I say, by this point,
[36:08.310 --> 36:13.070]  we should have a set of likely attack paths identified, so we know what we're testing for,
[36:13.070 --> 36:20.330]  we've got some use cases for that. We then go and execute those attack paths, usually manually,
[36:20.330 --> 36:25.410]  and we then review whether we actually fired off some alerts or not, and then we work out what we
[36:25.410 --> 36:31.970]  need to do next based on how much of our detection actually worked as expected. Now, some of this,
[36:31.970 --> 36:36.870]  actually, we can quite easily automate, in theory at least. Executing the attack paths
[36:36.870 --> 36:41.390]  is generally fairly easy to automate where we're targeting the control plane layer, or things like Kubernetes that are
[36:41.390 --> 36:47.070]  designed to be automated in the first place.
[36:47.070 --> 36:51.190]  One of the more interesting ones is doing an automated gap analysis: essentially spotting
[36:51.190 --> 36:55.490]  what fired that should have, what didn't fire, and so on and so forth, so that we can
[36:55.490 --> 37:00.970]  essentially track over time in an automated way our improvements, what's working, what's not.
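A gap analysis of that kind can start very small. The sketch below assumes, hypothetically, that each test case is mapped to the detection rule it should trigger and that you can pull the set of rule IDs that actually alerted during the run from your SIEM. Storing each run's result gives you the trend over time.

```python
from typing import Dict, Set


def gap_analysis(expected: Dict[str, str], fired: Set[str]) -> Dict[str, object]:
    """Compare the rule each test case should trigger with the rules that fired.

    expected maps test-case name -> detection rule ID; fired is the set of
    rule IDs that alerted during the run.
    """
    detected = sorted(tc for tc, rule in expected.items() if rule in fired)
    missed = sorted(tc for tc, rule in expected.items() if rule not in fired)
    # Alerts no test case accounts for: noise, or real activity to triage.
    unexpected = sorted(fired - set(expected.values()))
    coverage = len(detected) / len(expected) if expected else 0.0
    return {"detected": detected, "missed": missed, "unexpected": unexpected, "coverage": coverage}
```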
[37:02.370 --> 37:06.790]  And so these are all great ideas, but actually the industry is still mostly working on a manual
[37:06.790 --> 37:13.750]  basis for all of this, especially in the cloud. One really important lesson
[37:13.750 --> 37:19.270]  to take away from the rest of the cloud community is treat everything as code. DevOps guys have
[37:19.270 --> 37:24.110]  really got this down, and we're starting to see the security industry get on board as well in the
[37:24.110 --> 37:28.470]  detection space. And the big reason for that, really, is that if all of your detection is
[37:28.470 --> 37:32.770]  defined as code, it makes it much easier both to share knowledge internally, because anyone can
[37:32.770 --> 37:37.190]  just take a look at your rules, and if they're in a common format that they already understand,
[37:37.190 --> 37:40.150]  you know, some people have written some new ones, they can just go in and they can start reading it
[37:40.150 --> 37:44.910]  and get their heads around it. And if we have reasonably common code formats, it also makes it
[37:44.910 --> 37:50.130]  much easier to share. And so if we can share information on how to spot, you know, what given
[37:50.130 --> 37:55.590]  threat groups are up to, specific TTPs, and we can either post those on GitHub or share them between
[37:55.590 --> 38:00.910]  sort of some of the closed threat intel sharing groups, it means that everyone's detection gets
[38:00.910 --> 38:05.110]  better for comparatively minimal effort on the part of an individual organization.
[38:05.390 --> 38:11.470]  A few big things to pick out in this space. First, Sigma by Florian Roth. We've got a set of
[38:11.470 --> 38:17.330]  SIEM-agnostic rules, essentially, plus a compiler that produces your Splunk or
[38:17.330 --> 38:24.050]  your QRadar or your Elasticsearch queries off the back of this common rule set. So someone
[38:24.050 --> 38:29.810]  writes a Sigma rule for a particular TTP, and then you have the ability to use that in every
[38:29.810 --> 38:34.690]  platform going, pretty much. There are a lot of compilers in there. The other interesting one
[38:34.690 --> 38:39.030]  in this space is Jupyter Notebooks. So this is more sort of threat hunting, you know, proactive
[38:39.030 --> 38:47.390]  investigation rather than reactive alerting based on known TTPs. But the idea of a Jupyter Notebook,
[38:47.390 --> 38:51.530]  for those who aren't familiar, is it's essentially a way of having a document with Python embedded
[38:51.530 --> 38:59.770]  in it. And so you can run code inside your documents, basically, and tie explanations of
[38:59.770 --> 39:04.290]  what's happening and the data that an analyst is seeing, and directly to the executing code,
[39:04.290 --> 39:10.490]  which makes it quite a nice way to share information and threat hunts. The SpecterOps
[39:10.490 --> 39:14.850]  guys in particular have been doing some fun stuff with that, and it's starting to take off now. It started as
[39:15.150 --> 39:20.910]  a data science thing, but it's been quite nicely repurposed. And if you're looking for another
[39:20.910 --> 39:26.830]  conference talk to watch, try John Lambert's "The GitHubification of InfoSec" from T2 last year.
[39:26.830 --> 39:31.250]  It was a really interesting presentation on how the industry needs to work better together
[39:31.250 --> 39:38.650]  to start making it more expensive for attackers to compromise organizations by sharing knowledge
[39:38.650 --> 39:46.640]  better than we do right now. Now, one of the interesting problems we've had historically
[39:46.640 --> 39:51.820]  in the cloud is that a lot of the existing solutions for automatically testing attacker
[39:51.820 --> 39:57.160]  activity don't extend to the cloud. They're entirely focused on on-premise at the moment.
[39:57.160 --> 40:02.980]  So things like MITRE Caldera, Atomic Red Team, there's a number of tools out there that have
[40:02.980 --> 40:08.160]  been very successful at automating attack simulation on on-premise estates. But actually,
[40:08.160 --> 40:14.100]  we didn't have that for the cloud. So I went away and built one called Leonidas, which we've
[40:14.100 --> 40:23.340]  open-sourced. So Leonidas is essentially a web API where we have a variety of different
[40:23.340 --> 40:28.760]  attack definitions defined within it, and each API endpoint is a test case you can execute.
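To make the "each endpoint is a test case" idea concrete, here is a minimal sketch of turning a list of definitions into a route table and dispatching requests to it. The `/platform/tactic/name` path scheme and the definition fields are hypothetical illustrations, not Leonidas's actual schema, and the HTTP layer itself is left out.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_routes(definitions: List[Dict[str, Any]]) -> Dict[str, Handler]:
    """Turn test-case definitions into a path -> handler table.

    The /platform/tactic/name path scheme is hypothetical.
    """
    routes: Dict[str, Handler] = {}
    for definition in definitions:
        path = f"/{definition['platform']}/{definition['tactic']}/{definition['name']}"
        if path in routes:
            raise ValueError(f"duplicate test case: {path}")
        routes[path] = definition["execute"]
    return routes


def dispatch(routes: Dict[str, Handler], path: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Run the test case behind `path`, returning an HTTP-style status and body."""
    handler = routes.get(path)
    if handler is None:
        return {"status": 404, "error": f"no test case at {path}"}
    return {"status": 200, "result": handler(params)}
```

Generating the routes from the definitions, rather than hand-writing endpoints, is what lets a new test case ship just by committing a new definition.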
[40:29.220 --> 40:34.320]  So under the hood, actually, the way it works, we took a slightly different approach to how a lot
[40:34.320 --> 40:39.980]  of these tools work in that it's been designed to make it as easy as possible for you to share
[40:39.980 --> 40:45.180]  the ability to execute these test cases and to detect them between your team members internally
[40:45.180 --> 40:49.540]  and within the wider community, too. So let's say we've got a security team, we've got some
[40:49.540 --> 40:54.460]  analysts, we've got some red teams, purple teams, whatever, and one of them's read a new blog post
[40:54.460 --> 40:58.880]  about some new TTP in the cloud that they're interested in. So they sit down and they define
[40:58.880 --> 41:05.040]  this in our format. We've got like a YAML-based definitions format. And you write the TTP,
[41:05.040 --> 41:11.140]  it gets checked into a Git repository. We then have a CI/CD pipeline that essentially builds out
[41:11.640 --> 41:17.840]  a copy of the API based on the definitions that are in the Git repository. And that runs as a
[41:17.840 --> 41:22.520]  serverless function at the moment, a Lambda function in AWS, but we're building out support
[41:22.520 --> 41:29.260]  for Azure and GCP, too. And so you've then got this API exposed, and we can have some target
[41:29.260 --> 41:34.660]  resources that we're looking to attack. Your purple team comes in or your red team comes in,
[41:34.660 --> 41:39.080]  executes some attacks against these target resources, and we have our logs shipped out to
[41:39.080 --> 41:45.620]  our SIEM. And so we can then verify, when we triggered this new TTP, whether we
[41:45.620 --> 41:48.920]  have the right logs in place, whether we spotted it or not, and whether we need to tweak things
[41:48.920 --> 41:58.960]  further. So in these definitions, we define the actual test case itself. In this particular case,
[41:58.960 --> 42:03.640]  this is an AWS test case. We're looking at enumerating CloudTrail trails, which as an attacker
[42:03.640 --> 42:07.700]  you probably want to do to see if you're being monitored or not. And so we've got a single
[42:07.700 --> 42:16.580]  line of code here, Python calling the DescribeTrails API function on the CloudTrail API in
[42:16.580 --> 42:23.120]  AWS. And behind the scenes, the framework abstracts away a lot of things around what identity we're
[42:23.120 --> 42:29.060]  executing this as, which region are we targeting, all these kinds of things. And so the actual
[42:29.060 --> 42:34.260]  code that's written for these test cases is pretty minimal, which obviously then makes it easier to
[42:34.260 --> 42:39.380]  write a lot of these quite fast, or make it easy for less skilled analysts to get involved in
[42:39.380 --> 42:46.940]  developing these. We also define what you need to detect the use case as part of the same definition
[42:46.940 --> 42:52.120]  file, right? So we've got both the test case and the detection case defined together. Here we've
[42:52.120 --> 42:59.980]  got a detection case that says we've got an event name of DescribeTrails with a particular event
[42:59.980 --> 43:05.560]  source, and we're likely to find that in CloudTrail audit logs. So we can then compile that,
[43:05.560 --> 43:09.220]  well, we usually compile it down to Sigma and then rely on Sigma's compilers to compile it out for
[43:09.220 --> 43:16.500]  the different SIEM solutions. But we can also easily compile Lucene queries for Elasticsearch
[43:16.500 --> 43:20.260]  or pretty much whatever you want off it. You know, the data's there, it's just a case of writing
[43:20.260 --> 43:25.100]  something to generate the right queries. And also because we try and be responsible about these
[43:25.100 --> 43:29.140]  things, we define the permissions that are necessary for Leonidas to execute a specific
[43:29.140 --> 43:35.140]  test case. And so what we actually do here then is generate the right IAM permissions for AWS,
[43:35.620 --> 43:40.220]  or the necessary permissions for your service accounts in Azure or what have you,
[43:40.220 --> 43:45.140]  in order to allow us to execute the test case as safely as possible. And we can also set
[43:45.140 --> 43:49.380]  restrictions on which resources it can target and so on, to make sure you can only
[43:49.380 --> 43:54.220]  leverage it against test environments rather than your production accounts. We can also generate
[43:54.220 --> 43:59.480]  some documentation off the back of it. All of this is defined in a reasonably human-readable
[43:59.480 --> 44:05.400]  but not necessarily friendly file format. And so instead, what we do here is we then generate
[44:05.980 --> 44:10.560]  Markdown files off the back of that and use MkDocs, for those who are familiar with it, to generate a
[44:10.560 --> 44:14.760]  pretty website, which makes it a lot easier for those who are less familiar with some of this
[44:14.760 --> 44:19.360]  stuff to understand what's actually happening, what this test case is, how it works, and ties
[44:19.360 --> 44:22.520]  it back out to the MITRE framework. So you can click on the MITRE links there and it takes you
[44:22.520 --> 44:31.720]  out to the relevant TTP there. So we can do quite a lot really with it. So in the spirit of this
[44:31.720 --> 44:39.600]  being both a presentation and sort of a tool demo, it's demo time. Let's hope this works.
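Before the demo, a rough sketch of the compilation step described above: one definition carrying both the detection case (DescribeTrails from `cloudtrail.amazonaws.com`) and the permissions the test case needs, from which a Sigma rule, a Lucene query and a least-privilege IAM policy are generated. The definition's field names are hypothetical and not Leonidas's actual schema; the Lucene field names assume your index stores the raw CloudTrail field names, which depends on your ingest mapping.

```python
from typing import Any, Dict

# Hypothetical definition modelled on the DescribeTrails test case.
DEFINITION: Dict[str, Any] = {
    "name": "Enumerate CloudTrail trails",
    "permissions": ["cloudtrail:DescribeTrails"],
    "resources": ["*"],
    "detection": {"eventSource": "cloudtrail.amazonaws.com", "eventName": "DescribeTrails"},
}


def to_sigma(defn: Dict[str, Any]) -> Dict[str, Any]:
    """Build a Sigma rule (as a dict, ready to dump to YAML)."""
    return {
        "title": defn["name"],
        "logsource": {"product": "aws", "service": "cloudtrail"},
        "detection": {"selection": dict(defn["detection"]), "condition": "selection"},
    }


def to_lucene(defn: Dict[str, Any]) -> str:
    """Build a Lucene query string ANDing every detection field."""
    def quote(value: str) -> str:
        # Escape backslashes and double quotes inside a quoted phrase.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return " AND ".join(f"{name}:{quote(value)}" for name, value in defn["detection"].items())


def to_iam_policy(defn: Dict[str, Any]) -> Dict[str, Any]:
    """Least-privilege IAM policy allowing only what this test case needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(defn["permissions"]), "Resource": list(defn["resources"])}
        ],
    }
```

In practice you would narrow `resources` to the test environment's ARNs rather than `*`, which is the restriction to test accounts mentioned above.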
[44:39.640 --> 44:47.220]  So what we have here is a case configuration file for Leo, a Python
[44:47.220 --> 44:52.560]  script that ships with Leonidas and lets you easily talk to that API. So we can just define some
[44:52.560 --> 44:58.200]  fairly simple test cases here that say we're going to do an AWS attack path that's going to work out
[44:58.200 --> 45:04.320]  who we are, have a look at some defensive measures that might be in place, set up some persistence,
[45:04.320 --> 45:12.420]  and then steal some secrets. So what we're going to do now is actually run that.
[45:25.660 --> 45:30.140]  Fingers crossed this actually works. Let's hope the demo gods don't bite me in the backside.
[45:39.540 --> 45:46.400]  Here we go. So we're hitting the endpoint, and we've got Leonidas hosted up
[45:46.400 --> 45:51.020]  as a Lambda function behind an API gateway. So we've run a test case to work out who we are,
[45:51.020 --> 45:58.000]  and this has come back with us being Leonidas. So now we've had a look to see if we've
[45:58.000 --> 46:02.600]  got any GuardDuty detectors running. We have. It's not ideal, but we're going to carry on anyway.
[46:03.460 --> 46:08.700]  We just hope that we can move faster than the defenders can. So next up, we're going to
[46:08.700 --> 46:19.970]  enumerate the CloudTrail trails that we've got running. And so this was with the idea of spotting whether
[46:19.970 --> 46:26.390]  we've got anyone who's actually monitoring us, who's going to be hunting down our attacker
[46:26.390 --> 46:34.590]  activity within that. There we go. So we've got two trails running here, both of which are
[46:34.590 --> 46:38.350]  probably going to pick us up, but that's fine. We'll carry on anyway. So we've now added an IAM
[46:38.350 --> 46:44.410]  user in this particular case, Defcon user here. And that's to allow us to come back later on if
[46:44.410 --> 46:49.350]  we wanted to. Let's add an API key to it. And so we've got programmatic access. So that's come
[46:49.350 --> 46:53.810]  back with some access keys and things, which I'll be deleting straight after this talk.
[46:54.410 --> 46:57.550]  We then have a look and see what secrets we've got running in secrets manager.
[46:57.810 --> 47:01.650]  And so in this particular case, we've got a secret in here already that I created
[47:02.260 --> 47:08.350]  earlier. And we're then going to run the commands to actually dig
[47:08.350 --> 47:14.570]  out the secret value here. And you can see we've got a test API key lying around inside the secret
[47:14.570 --> 47:24.350]  string. So we've just run an essentially seven-step kill chain here with a short YAML file defining
[47:24.350 --> 47:30.850]  what test cases we want to execute. And that's all just hitting an API that's running up in AWS.
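The runner side of that demo can be sketched in a few lines. This is not Leo's implementation: the case structure, the endpoint paths and the assumption that each test case is a GET returning JSON are all hypothetical. The steps mirror the seven-step chain above, and the runner stops at the first failing step, much as a real intrusion would stall.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List

Caller = Callable[[str, Dict[str, Any]], Any]

# Hypothetical case file mirroring the demo; paths are illustrative only.
CASE: Dict[str, Any] = {
    "name": "aws-demo-killchain",
    "steps": [
        {"path": "/aws/discovery/whoami"},
        {"path": "/aws/discovery/list-guardduty-detectors"},
        {"path": "/aws/discovery/list-cloudtrail-trails"},
        {"path": "/aws/persistence/create-iam-user", "args": {"user": "test-user"}},
        {"path": "/aws/persistence/create-access-key", "args": {"user": "test-user"}},
        {"path": "/aws/credential-access/list-secrets"},
        {"path": "/aws/credential-access/get-secret-value", "args": {"secret": "test-secret"}},
    ],
}


def make_http_caller(base_url: str, timeout: float = 30.0) -> Caller:
    """Return a caller that GETs base_url + path and parses the JSON response."""
    def call(path: str, args: Dict[str, Any]) -> Any:
        url = base_url.rstrip("/") + path
        if args:
            url += "?" + urllib.parse.urlencode(args)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return call


def run_case(case: Dict[str, Any], call: Caller) -> List[Dict[str, Any]]:
    """Execute steps in order, stopping at the first one that fails."""
    results: List[Dict[str, Any]] = []
    for step in case["steps"]:
        try:
            output = call(step["path"], step.get("args", {}))
        except Exception as exc:  # network errors, HTTP errors, bad JSON, ...
            results.append({"path": step["path"], "ok": False, "error": str(exc)})
            break
        results.append({"path": step["path"], "ok": True, "output": output})
    return results
```

Injecting `call` keeps the runner testable offline; against a deployment you would use `run_case(CASE, make_http_caller("https://example.execute-api.amazonaws.com/prod"))`, with the URL standing in for your own API gateway endpoint.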
[47:33.070 --> 47:41.790]  So now on top of that, we can actually just interact with it through the Swagger interface
[47:41.790 --> 47:47.750]  if we wanted to. So you can test out the test cases here if you wanted to as you're developing
[47:47.750 --> 47:52.990]  new ones. And you can run all this locally as well as in Lambda, obviously. And this is just
[47:52.990 --> 47:58.230]  really to help with development. But you've also got Swagger definitions here if you wanted to
[47:58.230 --> 48:03.530]  import that into something like Postman or Insomnia or build it into other tooling.
[48:03.830 --> 48:08.850]  The API is pretty easy to work with. The other thing worth calling out
[48:08.850 --> 48:14.550]  quickly here is that Leonidas generates its own logs. So I'm just going to take a quick
[48:14.550 --> 48:27.910]  look. Let's load up the CloudWatch log streams from what we've just done. No, in fact, we won't
[48:27.910 --> 48:36.230]  be doing that; we'll get into that in a second. But one of the really powerful things
[48:36.230 --> 48:43.650]  about this is that we've got an API and a set of things we can execute automatically.
[48:43.690 --> 48:48.890]  And we can then build that into pretty much any other security testing that we've got going on,
[48:48.890 --> 48:53.150]  especially some of the on-premise testing that we're already doing. Let's say we're using
[48:53.150 --> 48:56.970]  something like Caldera or Atomic Red Team to do on-premise simulation already. We can
[48:56.970 --> 49:02.830]  tie the two together and have simulations kicking off on-premise. Let's say we
[49:02.830 --> 49:07.530]  simulate some phishing, a bit of lateral movement, and some Active Directory privilege escalation,
[49:07.530 --> 49:11.550]  then pivot into the cloud from there and start using Leonidas to trigger
[49:11.550 --> 49:16.790]  cloud test cases off the back of that, with the credentials that we've just stolen in the
[49:16.790 --> 49:21.710]  simulated attack on-premise. And to me, that's really exciting. We've been able to do
[49:21.710 --> 49:26.950]  on-premise automation for a while, and we've done a fair bit of manual stuff in the cloud like this,
[49:26.950 --> 49:32.090]  but now we can tie all that together and execute hybrid kill chains in a completely automated
[49:32.090 --> 49:41.050]  fashion. As for the log data I mentioned, Leonidas generates its own log telemetry recording
[49:41.050 --> 49:46.210]  that it executed this test case, as this identity, with these parameters passed in, and this was the resulting
[49:46.210 --> 49:53.730]  data that came back. That all feeds into a CloudWatch log stream in AWS, with Stackdriver
[49:53.730 --> 49:58.590]  and Azure Monitor filling the same role for GCP and Azure. So we can ship that out somewhere ourselves and monitor
[49:58.590 --> 50:03.110]  that separately. One thing that strikes me as really exciting here, or at least as potential for
[50:03.110 --> 50:10.470]  some really exciting work, is integrating the SIEM alerts and tagged notables, or whatever,
[50:10.990 --> 50:16.990]  that are generated by these test cases, and essentially performing automatic gap analysis:
[50:17.690 --> 50:22.570]  working out, for every Leonidas test case that's fired, as recorded in the Leonidas logs that are
[50:22.570 --> 50:26.950]  coming in, whether we got what we expected out the other side. We can then track over time,
[50:26.950 --> 50:32.990]  as we write new detection cases and tweak things, how much better we're getting. That gives us some
[50:32.990 --> 50:38.290]  nice data, both to make us feel good about ourselves and to justify to whoever
[50:38.290 --> 50:42.070]  is signing the paychecks that what we're doing is worthwhile and that we're improving
[50:42.070 --> 50:48.590]  over time, and we can demonstrate that. One of the reasons I bring
[50:48.590 --> 50:52.970]  up that continuous integration aspect specifically is that it supports us on
[50:52.970 --> 51:00.930]  a continuous journey. We can keep improving our detection capability:
[51:00.930 --> 51:04.970]  write new use cases, write new test cases, embed them in Leonidas, and immediately start
[51:04.970 --> 51:09.850]  simulating them alongside all the other ones we've already got, and keep
[51:09.850 --> 51:16.350]  going. Detection is a moving target; attackers are continually coming up with new techniques,
[51:16.350 --> 51:22.690]  new mechanisms for targeting you, new tooling, and so on. So we need to be able to move with
[51:22.690 --> 51:27.630]  the times as well, and being able to define all this stuff as code and automate as much of it
[51:27.630 --> 51:33.970]  as possible makes it easier to get better, faster, with fewer people. So it's worth actually
[51:33.970 --> 51:40.690]  treating your detection improvement projects as software development projects. I find that works
[51:40.690 --> 51:46.730]  pretty well. You want to be identifying new threats and risks on a continuous basis,
[51:46.730 --> 51:51.690]  or at least taking the time to do that regularly and feeding that into some sort of design process
[51:51.690 --> 51:56.650]  to come up with new use cases to detect. We then simulate those, evaluate whether
[51:56.650 --> 52:01.630]  the changes we've made as a result actually improved anything, and cycle
[52:01.630 --> 52:07.450]  round and round. That's how you get better at this stuff over time. So there are a few things to
[52:07.450 --> 52:11.930]  conclude on, I think. First off, the cloud brings a lot of benefits
[52:11.930 --> 52:16.950]  from a security perspective, and loads of people have been talking about that in the prevention
[52:16.950 --> 52:22.430]  space for a while. There's a lot on the detection side as well. We've got visibility
[52:22.430 --> 52:26.610]  like nothing we've ever had on-premise before. If you've got a list of all your cloud accounts,
[52:26.610 --> 52:31.530]  you can get all of your log data comparatively easily, and we don't end up in
[52:31.530 --> 52:35.510]  this situation where we've got some visibility of some things and not of others. And we lose
[52:35.510 --> 52:40.930]  attackers because they've pivoted off the box that they landed on and they persisted inside
[52:40.930 --> 52:45.390]  some legacy printer with some old dodgy firmware and we've got no monitoring for that.
[52:45.390 --> 52:50.730]  We don't have those kinds of problems in the cloud anymore. The problem that comes with that is
[52:50.730 --> 52:57.290]  that all of the benefits the cloud brings, all of the APIs and the automatability, mean
[52:57.290 --> 53:01.430]  that attackers are already automating this stuff. We were talking about crypto mining
[53:01.430 --> 53:04.890]  earlier, and that's going to become more and more of a problem over time. We're going to start seeing
[53:04.890 --> 53:11.610]  attackers automating more complex, more interesting kill chains to go after potentially
[53:11.610 --> 53:15.830]  more damaging things. At the end of the day, Bitcoin mining is not great; you don't really
[53:15.830 --> 53:21.830]  want to be paying for someone else's cryptocurrency. But it pales in comparison
[53:21.830 --> 53:28.150]  to attackers automating things like stealing loads of really important data or defacing the
[53:28.150 --> 53:32.450]  front page of someone's website, or whatever else. And I think we're going to start seeing attackers do that
[53:32.450 --> 53:37.750]  in the long term. So we need to get better at this now. We need to be in front of this whole
[53:37.750 --> 53:43.070]  cloud detection thing from the get-go, and we need to be getting better at this as an industry.
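The automatic gap analysis suggested earlier, matching every test case recorded in the simulation logs against the SIEM alerts it should have triggered, is one concrete way to measure whether we are actually getting better. The sketch below is a hypothetical illustration: the record fields (`test_case`, `identity`, `time`, `rule`), the test-case-to-rule mapping and the 30-minute matching window are assumptions, not the actual Leonidas log format or any particular SIEM's schema.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from each simulated test case to the SIEM rule
# we expect it to trigger.
EXPECTED_RULE = {
    "create_user": "IAM user created",
    "create_access_key": "Access key created for another user",
    "get_secret_value": "Secret read by unusual identity",
}


def _ts(value):
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def gap_analysis(executions, alerts, window=timedelta(minutes=30)):
    """Split executed test cases into detected and missed, where 'detected'
    means the expected rule fired for the same identity within `window`
    after the test case ran. Returns (detected, missed, coverage)."""
    detected, missed = [], []
    for ex in executions:
        rule = EXPECTED_RULE.get(ex["test_case"])
        ran_at = _ts(ex["time"])
        hit = rule is not None and any(
            a["rule"] == rule
            and a["identity"] == ex["identity"]
            and timedelta(0) <= _ts(a["time"]) - ran_at <= window
            for a in alerts
        )
        (detected if hit else missed).append(ex["test_case"])
    coverage = len(detected) / len(executions) if executions else 0.0
    return detected, missed, coverage


executions = [
    {"test_case": "create_user", "identity": "sim-role", "time": "2021-08-07T18:00:00Z"},
    {"test_case": "create_access_key", "identity": "sim-role", "time": "2021-08-07T18:01:00Z"},
    {"test_case": "get_secret_value", "identity": "sim-role", "time": "2021-08-07T18:05:00Z"},
]
alerts = [
    {"rule": "IAM user created", "identity": "sim-role", "time": "2021-08-07T18:03:00Z"},
    {"rule": "Secret read by unusual identity", "identity": "sim-role", "time": "2021-08-07T18:20:00Z"},
]
detected, missed, coverage = gap_analysis(executions, alerts)
print(missed, f"{coverage:.0%}")  # ['create_access_key'] 67%
```

Storing the coverage figure from each run is what gives the over-time trend mentioned in the talk.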
[53:43.770 --> 53:51.170]  To go along with that, we released Leonidas, which is a tool for automating a lot
[53:51.170 --> 53:55.790]  of this stuff: both the attack simulation and the attack detection piece, including writing the detection
[53:55.790 --> 54:03.010]  cases. Right now it's AWS only, but we're actually not that far off Azure and GCP. We're starting
[54:03.010 --> 54:08.870]  to test some of those out. There are currently 40 test cases in there, 41 for AWS, and we're
[54:08.870 --> 54:14.910]  pushing more out all the time. You can grab it from our GitHub now, at fsecurelabs/leonidas.
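Since the test cases live in a repository and new ones keep being added, the continuous-integration loop described earlier can include a simple regression gate: compare which test cases were detected in the last accepted run with the current run, and fail the pipeline if anything that used to be detected no longer is. This is a hypothetical sketch; the function names, the plain-list inputs and the exit-code convention (non-zero fails the CI job) are assumptions rather than anything Leonidas provides.

```python
def find_regressions(baseline_detected, current_detected):
    """Test cases detected in the baseline run but missed in the current one."""
    return sorted(set(baseline_detected) - set(current_detected))


def regression_gate(baseline_detected, current_detected):
    """Print regressions and improvements; return a CI-style exit code (0 = pass)."""
    regressions = find_regressions(baseline_detected, current_detected)
    for name in regressions:
        print(f"REGRESSION: {name} is no longer detected")
    for name in sorted(set(current_detected) - set(baseline_detected)):
        print(f"improved: {name} is now detected")
    return 1 if regressions else 0


# Example run; in a CI job you would pass the return value to sys.exit().
exit_code = regression_gate(["create_user", "get_secret_value"],
                            ["create_user", "create_access_key"])
print("exit code:", exit_code)
# REGRESSION: get_secret_value is no longer detected
# improved: create_access_key is now detected
# exit code: 1
```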
[54:17.830 --> 54:20.930]  So thanks for listening. And at that point, I'll take questions.
[54:20.930 --> 54:23.150]  Just one question from the audience.
[54:23.370 --> 54:29.550]  What do you recommend they use for an identity store other than cloud SSL?
[54:29.910 --> 54:37.530]  So basically, I'd recommend anything that's not Active Directory. And so we've seen a lot
[54:37.530 --> 54:46.250]  of organizations pick up things like Ping, Okta, Auth0 or Authy. There are a number
[54:46.250 --> 54:51.650]  of providers that offer single sign-on platforms designed specifically to do single sign-on
[54:51.650 --> 54:57.590]  for large cloud estates. The reason I recommend that is purely that it breaks the link
[54:57.590 --> 55:03.490]  between your on-premise legacy identity store and the cloud. That does bring in a bunch of
[55:03.490 --> 55:07.550]  extra overhead from a management perspective: all of your HR joiners and leavers processes,
[55:07.550 --> 55:11.910]  all of that kind of thing, get extra steps added. But it also means that
[55:11.910 --> 55:17.730]  if your Active Directory is breached, you don't immediately get owned in the cloud as well,
[55:17.730 --> 55:22.470]  which, in my experience working with Active Directory
[55:22.470 --> 55:26.170]  over the past few years, is something that would happen to quite a few organizations.
